Parse and Corpus-Based Machine Translation

Authors

  • Vincent Vandeghinste
  • Scott Martens
  • Gideon Kotzé
  • Jörg Tiedemann
  • Joachim Van den Bogaert
  • Koen De Smet
  • Frank Van Eynde
  • Gertjan van Noord
Abstract

The current state of the art in machine translation is phrase-based statistical machine translation (PB-SMT) [23], an approach in use since the late 1990s that evolved from the word-based SMT proposed by IBM [5]. These string-based techniques, which use no linguistic knowledge, seem to have reached their ceiling in terms of translation quality, yet the model still has a number of limitations: it lacks a mechanism to deal with long-distance dependencies, it has no means to generalise over non-overt linguistic information [37], and it has limited word reordering capabilities. Furthermore, in some cases the output lacks the fluency and grammaticality needed to be acceptable to actual MT users, and sometimes essential words are missing from the translation. To overcome these limitations, efforts have been made to introduce syntactic knowledge into the statistical paradigm, usually in the form of syntax trees, either

Similar resources

Learning to Parse Bilingual Sentences Using Bilingual Corpus and Monolingual CFG

We present a new method for learning to parse a bilingual sentence using Inversion Transduction Grammar trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the shared syntactic structures of individual sentence and the differences of word order within a syntactic structure. The method involves estimating lexical tr...

Fast and Accurate Preordering for SMT using Neural Networks

We propose the use of neural networks to model source-side preordering for faster and better statistical machine translation. The neural network trains a logistic regression model to predict whether two sibling nodes of the source-side parse tree should be swapped in order to obtain a more monotonic parallel corpus, based on samples extracted from the word-aligned parallel corpus. For multiple ...

Pre-Reordering for Machine Translation Using Transition-Based Walks on Dependency Parse Trees

We propose a pre-reordering scheme to improve the quality of machine translation by permuting the words of a source sentence to a target-like order. This is accomplished as a transition-based system that walks on the dependency parse tree of the sentence and emits words in target-like order, driven by a classifier trained on a parallel corpus. Our system is capable of generating arbitrary permu...

Syntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling

We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...

A Method of Automatically Adapting an MT System to Different Domains

In order to achieve high translation quality for existing documents in a special domain using conventional MT systems, a domain adaptive translation method based on bilingual corpora has been proposed. In this method, source sentences in a bilingual corpus are translated by the MT system and the results are compared with the target expressions in the corpus. The identifying parse trees of the m...

Publication date: 2013